Abstract

Scarcity of frequencies combined with the demand for more bandwidth is likely to increase the need for devices that 
provide high wireless bandwidth in limited areas while using a wired network to carry data over longer distances. 
Examples of such devices are Wi-Fi routers, femtocells and future devices that use TV whitespace. To utilize
the available frequencies more efficiently, radios must be able to find other users of the frequency bands and adapt
so that they do not suffer interference. As transmitters hundreds of kilometers away may cause as much interference as a
transmitter located next door, this mechanism cannot be based on location alone. Central databases can be used for
this purpose, but with thousands or millions of radio devices to coordinate, a centralized system may not always be
ideal. In this paper, we propose a decentralized protocol and architecture for discovering interfering radio devices 
over the Internet. The protocol has low bandwidth-, memory- and processing requirements, making it suitable for 
platforms with limited resources. We evaluate the protocol through simulation in network topologies with up to 1 000 
000 nodes, including topologies generated from three municipalities in Norway. We also describe a proof-of-concept 
resource allocation algorithm and prove that it converges with information gathered by the discovery protocol. 



1 Introduction 

Many radio transmitters we use daily are connected to
the Internet. Wi-Fi routers are a typical example, but
recently other devices, such as femtocells, have appeared.
The scarcity of frequencies combined with the demand
for more bandwidth is likely to further increase the need
for devices that provide high wireless bandwidth locally
while using a wired network to carry data over longer
distances. Locally available frequencies can be utilized
more efficiently by enabling these devices to coordinate
with other radio devices in their area which they could
interfere with or suffer interference from.

The FCC has proposed using a database for discovering
available frequencies in the US. The database will contain
areas where it is safe to use radio transmission in part
of the white space TV frequencies. The system is
dimensioned to take care of the TV viewers without knowing
their location by making worst case assumptions. How
the database is to be accessed is about to be defined by
the Protocol to Access White Space database [22]. However,
this system is not designed to allow fine grained
discovery due to the high resource requirements in large
networks.

As an example of the requirements of a more fine
grained system, we could assume that a central database
was configured to provide accurate information within
one hour. Each radio node would have to update the
database within this time, either to request new frequencies
or to renew an existing lease. Assuming a relatively
modest network of 100 000 nodes, this would require
approximately 28 updates to the database per second
(100 000 updates spread over 3600 seconds). Each
update can potentially trigger a time consuming domino
effect in resource allocation, especially in dense areas of
the network. A solution is to make the update interval
long enough to allow processing to take place, but previous
work has shown that if accurate information about
receivers is known, significantly more bandwidth in Hz/s
can be achieved [20][7]. In other words, long update
intervals may decrease the effectiveness of the system
significantly.

Cognitive Radio (CR) and Dynamic Spectrum Access
(DSA) are technologies that can help alleviate the coming
spectrum shortage. So far, most research has been
focused on physical layer and MAC layer capabilities of
such systems, and recently some standards (such as IEEE
802.22 [3]) have emerged. When investigating physical
layer performance of CR and DSA networks one usually
assumes a network consisting of 10-50 nodes. However,
a network of radio devices connected to the Internet will 
consist of thousands to millions of nodes distributed over 
large areas, even countries. With this vast amount of dif- 
ferent radios it is important to be able to find other radios 
to communicate with and also radios one needs to coor- 
dinate traffic with. 

A challenge in large centralized DSA systems is
distributing the work load between multiple database servers.
Radio transmitters may interfere with, or suffer interference
from, other nodes that are far away, making it difficult to
divide responsibility by geographical area. When changes occur
in the network, the time it takes to synchronize and co- 
ordinate the database servers affects the accuracy of the 
response. The longer the client must wait, the less accu- 
rate the response becomes. Having a distributed system 
could reduce or remove the need for synchronization be- 
tween servers. A hybrid system could be designed so 
that centralized databases are used in clearly defined ge- 
ographical areas while a decentralized protocol provides 
information outside these areas as a fallback. A decen- 
tralized protocol may also be used to discover available 
databases in the area one is in, leaving the radio user 
responsible for contacting the available (uncoordinated) 
servers. 

Another argument for distributed systems is that they 
are more resilient and robust to failure. If extreme situa- 
tions should arise, such as natural disasters, a centralized 
system is more vulnerable than a decentralized one. In 
less extreme situations, such as during network partition- 
ing or power outages, a distributed system would be able 
to continue to function even if large parts of the network 
are unavailable. In a hybrid system the decentralized pro- 
tocol could become a fallback mechanism while the cen- 
tral server is unreachable. To the best of our knowledge, 
a large-scale distributed protocol for DSA has not been 
previously described in the literature. 

In this paper, we propose and implement a decen- 
tralized peer-to-peer protocol for large-scale discovery 
of radio devices over the Internet. The protocol has 
low bandwidth-, memory- and processing requirements, 
making it suitable for running on platforms with limited 
resources, such as future Wi-Fi routers and femtocells. 
We evaluate the protocol through simulation in large net- 
work topologies with up to 1 000 000 nodes, including 
topologies generated from population patterns in three 
municipalities in Norway. We also describe a proof-of- 
concept resource allocation algorithm and prove that it 
converges with information gathered by the discovery 
protocol. Finally, we propose a generic architecture for
development of new discovery mechanisms and resource
allocation algorithms in large hybrid- or fully distributed
systems.

Figure 1: Nodes A, B, C and D with coordination ranges.

DSA is a complex problem and although there is a lot 
of existing work in this area there are few proposals for 
complete systems. Some of the components described 
in this paper are adapted from approaches published on 
their own in earlier works, but a key contribution of this 
paper is the sum-of-the-parts. 

This paper is organized as follows. In Section 2 the
discovery protocol is presented. The architecture is
described in Section 3. Section 4 describes our adapted
resource allocation algorithm. In Section 5 both the
discovery protocol and the resource allocation algorithm are
evaluated. Section 7 concludes the paper.

2 Discovery protocol 

We assume that the radio nodes are connected to the In- 
ternet and have a known geographical position. The po- 
sition is obtained by a location service, such as GPS, 
or by letting the user enter a street address or location. 
Around each node we define a coordination area, an area 
in which the node may interfere with others. For simplic- 
ity, we assume that this area is circular and that we are 
using omnidirectional antennas. 

As an example, Figure 1 shows nodes A, B, C and D
with coordination ranges. We can see that nodes A and
B may cause interference in the same area and should
therefore know about each other. C, on the other hand, is
outside of both A and B's radio ranges and can safely be
ignored by these nodes. D is a strong radio transmitter 
and interferes with all the other nodes. From this exam- 
ple we can see that although D is farther away from A and 
B than C, D is much more important in terms of resource 
allocation. The goal of the protocol is thus to enable each 
node to discover all other nodes which have coordination 
areas overlapping their own, i.e., other nodes which may
interfere in the same area as themselves. We call nodes
which have overlapping coordination ranges candidate
nodes, as they are candidates for consideration by the
resource allocation algorithm.

Figure 2: Overview of the discovery protocol.

The discovery protocol is based on an unstructured 
peer-to-peer (P2P) overlay. The protocol has two
mechanisms, as presented in Figure 2. The first mechanism
provides a random sample of all nodes participating in 
the network. The second mechanism selects the most 
important nodes from the random sample and exchanges 
information with them. We now describe the protocol in 
more detail, as well as the function we use to determine 
the importance, or utility, of each node. 

2.1 Random sampling 

In order to provide an approximate random sample of 
the network we use a gossip-based peer sampling
service [12]. In our implementation, we have used the
Newscast protocol [11], but other peer sampling services
could also be used. The Newscast protocol is a generic 
protocol which maintains a table of N known news items. 
Each news item is a data object associated with a times- 
tamp and the IP address of the node that produced it. The 
timestamp is used to discard the oldest items. 

At periodic intervals, Newscast selects a source node 
at random from its table of news items. The full table 
of N entries is then sent to the selected node. At the top 
of the table its own news item is added with an updated 
timestamp. The selected node then replies with its own 
table and its own news item. After the exchange, the 
tables in both nodes contain 2N entries. To reduce the 
length of the table it is sorted by timestamp. The oldest 
timestamps are then deleted until the length is N. A new
random node is selected and the process is then repeated.
This mechanism ensures that nodes have a near random
set of other nodes participating in the P2P network and
that obsolete information expires over time.

Field                  Length            Description
Identifier             8 bytes           Overlay node ID of the source node
Location               16 bytes          Geographical location of the source node
Coordination range     8 bytes           Coordination range of the source node
IPv4 or IPv6 address   4 or 16 bytes     Source IP
Timestamp              8 bytes           When the news item was produced

Table 1: Fields included with each news item. Length is
56 bytes with IPv6 addressing.
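The merge step of the exchange described above can be sketched as follows. This is a minimal illustration rather than the Newscast implementation itself: the names `NewsItem` and `merge_tables` are hypothetical, and keeping only the freshest item per source node is an assumption made here so that duplicates do not crowd the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    node_id: int        # randomly assigned overlay identifier
    address: str        # IPv4 or IPv6 address of the source node
    location: tuple     # geographical position (units are an assumption)
    coord_range: float  # coordination range of the source node
    timestamp: float    # when the item was produced


def merge_tables(own, received, n):
    """Merge two news tables and keep the n freshest items.

    Assumption: at most one item is kept per (address, node_id) source.
    """
    freshest = {}
    for item in list(own) + list(received):
        key = (item.address, item.node_id)
        if key not in freshest or item.timestamp > freshest[key].timestamp:
            freshest[key] = item
    # Sort by timestamp, newest first, and cut the table back to n entries.
    ordered = sorted(freshest.values(), key=lambda it: it.timestamp, reverse=True)
    return ordered[:n]


table_a = [
    NewsItem(1, "10.0.0.1", (0.0, 0.0), 50.0, 10.0),
    NewsItem(2, "10.0.0.2", (5.0, 5.0), 50.0, 3.0),
]
table_b = [
    NewsItem(2, "10.0.0.2", (5.0, 5.0), 50.0, 8.0),
    NewsItem(3, "10.0.0.3", (9.0, 9.0), 80.0, 1.0),
]
merged = merge_tables(table_a, table_b, n=2)
```

Both peers would run the same merge on the union of their tables, so after one exchange they hold the same N freshest items.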

To enable discovery based on location, we add several 
fields to the data object associated with each news item. 
These fields are: A randomly assigned source node iden- 
tifier, the geographical location of the source node, and 
its coordination range. The source node identifier is used 
to enable multiple devices to use the same IP address, 
while the location and coordination range are used for 
discovery. 

Table 1 contains the complete list of fields distributed
through the random sampling mechanism.



2.2 Utility function 

The nodes we must coordinate with are other nodes
which may interfere with us or which may suffer interference
from us, as illustrated in Figure 1. This cannot be solved
with a regular distance function, as nodes far away may
interfere with us, while a node right next to us could be
using very low power and not interfere at all.

To determine which nodes we should coordinate with
we define a utility function based on the overlap between
coordination ranges. If the sum of the coordination
ranges of two nodes is higher than the distance between
the nodes, their coordination areas overlap. This
can be expressed as a function by dividing the square of
the sum of the coordination ranges by the distance between
the nodes. This is shown in Equation 1, where $x_{i,j}$,
$y_{i,j}$, $z_{i,j}$ and $cr_{i,j}$ are the location and coordination ranges
of nodes i and j, respectively. When the areas are overlapping,
$f()$ is higher than 1, while with no overlap the
result is less than 1.
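The utility function can be sketched in code. The sketch below assumes that the denominator is the squared Euclidean distance, i.e. $f = (cr_i + cr_j)^2 / ((x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2)$, which makes the ratio dimensionless and places the overlap threshold exactly at 1. The function name `utility` and the tuple-based positions are illustrative choices.

```python
import math


def utility(pos_i, range_i, pos_j, range_j):
    """Overlap utility between nodes i and j.

    Assumption: squared sum of coordination ranges divided by the squared
    Euclidean distance. Values above 1 mean the coordination areas overlap.
    Positions may be 2D or 3D tuples in the same length unit as the ranges.
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    if dist_sq == 0.0:
        return math.inf  # co-located nodes always overlap
    return (range_i + range_j) ** 2 / dist_sq


# Nodes 60 m apart with 50 m ranges overlap; at 150 m they do not.
near = utility((0.0, 0.0), 50.0, (60.0, 0.0), 50.0)
far = utility((0.0, 0.0), 50.0, (150.0, 0.0), 50.0)
# A strong transmitter 1 km away still overlaps because of its large range.
strong = utility((0.0, 0.0), 50.0, (1000.0, 0.0), 2000.0)
```

The third case mirrors node D in Figure 1: distance alone would rank it last, while the utility ranks it above the nearby node that does not overlap.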
2.3 Important nodes

Important nodes are those discovered so far that have the 
highest utility. When new nodes are discovered using
the random sampling technique described in Section 2.1,
their utility is calculated using Equation 1. The nodes
are then added to the table of important nodes. This table 
has a fixed length M and the entries are sorted by the 
result of the utility function. When adding new nodes 
to a full table, the nodes at the end of the table (with 
lowest utility) are removed. The mechanism ensures that 
given enough time, the M nodes which have the highest 
utility are discovered. However, waiting for all candidate 
nodes to be discovered randomly can take time in large 
networks. 

As the utility function is based on distance, it is likely 
that a node with high utility has information about other 
nodes of interest to us in the same area. By periodically 
exchanging important nodes in a similar manner to the 
random sample, the discovery time can thus be reduced 
significantly. At random intervals, we therefore select 
a random node from the table of important nodes. The 
node is selected from the top 10 nodes with the highest 
utility, or all nodes with utility > 1 if we have found more 
than 10 nodes with overlapping coordination ranges. Using
the utility function, we then select K nodes from our
list which are most useful to the selected node, where K
is less than or equal to M. We proceed to send this list to
the other node. In return, the other node replies with a 
list of the K nodes which are the most interesting for us. 
Finally, both nodes merge the list of K nodes with their 
table of M entries, deleting the entries with lowest utility. 
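One round of this exchange can be sketched as follows. Nodes are represented as `(x, y, coordination_range)` tuples, and the helper names `select_for_peer` and `merge_important` are hypothetical. The utility used is the overlap ratio under the assumption of a squared-distance denominator.

```python
import math


def utility(a, b):
    """Overlap utility between two nodes given as (x, y, coord_range)."""
    dist_sq = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.inf if dist_sq == 0 else (a[2] + b[2]) ** 2 / dist_sq


def select_for_peer(table, peer, k):
    """Pick the k nodes from our table that are most useful to the peer."""
    candidates = [n for n in table if n != peer]
    return sorted(candidates, key=lambda n: utility(n, peer), reverse=True)[:k]


def merge_important(me, table, received, m):
    """Merge received nodes into our table and keep the m best for us."""
    merged = set(table) | set(received)
    merged.discard(me)  # a node never stores itself
    return sorted(merged, key=lambda n: utility(n, me), reverse=True)[:m]


me = (0.0, 0.0, 50.0)
peer = (40.0, 0.0, 50.0)
table = [peer, (1000.0, 0.0, 50.0), (30.0, 10.0, 50.0)]

to_send = select_for_peer(table, peer, k=1)
# Suppose the peer replies with one node that is interesting for us.
reply = [(60.0, 0.0, 50.0)]
table = merge_important(me, table, reply, m=3)
```

Because the selection is computed from the receiver's point of view, each message carries the entries most likely to survive in the receiver's table.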

If the number of overlapping nodes exceeds M, i.e. a 
node has more than M nodes with a utility higher than 
1, the nodes are not able to discover all their candidate 
nodes. By considering all entries with utility larger than 
1 as equal and sorting them by age, we can delete the 
oldest entries when the table is full. The random sam- 
pling mechanism then ensures that all nodes will eventu- 
ally be discovered, but their entries are not necessarily in 
the table at the same time. When M can be chosen such 
that it is higher than the maximum number of nodes with 
utility > 1, it leads to the lowest convergence time. 

By allowing K to be shorter than M, the bandwidth 
requirements of the protocol can be kept low, even if M 
is large. 



2.4 Hardware, memory and bandwidth 
considerations 

The memory requirements of the protocol are bounded
by the size N of the random sample and the number of
remembered nodes M, as well as the size of each news item
(see Table 1). As we also need buffers for receiving
updates from other nodes, the total memory requirement is
approximately 2N + M + K entries. In the evaluation in Section
5 we see that the protocol reaches a stable state with
relatively low values for N, M and K: 40, 600 and 100,
respectively. If we assume that each news item is close
to 56 bytes long, the total memory requirements for
storing the tables would be 43 680 bytes. This should make
the protocol suitable for implementation in modest
hardware.

The bandwidth consumption is determined by the
length of the periodic interval, the size of N and the size
of K. If we assume N = 40 and K = 100, the data sent at
each interval would be 7840 bytes. This is small enough
to fit in a single UDP datagram. With a periodic interval of
15 seconds, the average amount of data sent from each
client would be approximately 0.5 kilobytes (about 4.2
kilobits) per second. In terms of Internet traffic, where
traffic is measured in megabits, this is fairly low. By
decreasing K or having a longer periodic system update
interval, the average bandwidth consumption can be reduced
at the expense of a less responsive system.
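The memory and bandwidth figures in this section follow from simple arithmetic, reproduced below. The variable names are illustrative; the values are the ones stated in the text (56-byte news items, N = 40, M = 600, K = 100 and a 15 second interval).

```python
ITEM_BYTES = 56          # news item size with IPv6 addressing (Table 1)
N, M, K = 40, 600, 100   # table sizes used in the evaluation
INTERVAL_S = 15          # periodic update interval in seconds

# Memory: two Newscast buffers, the important-nodes table and a K-entry buffer.
table_bytes = (2 * N + M + K) * ITEM_BYTES

# Bandwidth: one Newscast table plus K important nodes sent per interval.
message_bytes = (N + K) * ITEM_BYTES
bytes_per_second = message_bytes / INTERVAL_S
kilobits_per_second = bytes_per_second * 8 / 1000

print(table_bytes, message_bytes, round(kilobits_per_second, 2))
```

The result is 43 680 bytes of table memory and 7840 bytes per interval, which averages to roughly 523 bytes, or about 4.2 kilobits, per second.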

As the maximum number of nodes with utility > 1 
is usually unknown and may vary greatly from node to 
node, it is useful to be able to adjust the length of the 
table of important nodes dynamically. This can be
accomplished by initially creating a table in memory with
room for M + K entries, where M = K. This ensures that
when K entries are received from another node, there is
always room in memory to store them. After receiving
new information, nodes with utility < 1 are discarded
until the number of items in the table is back to M or there
are no more items to discard. If the number of items with
utility > 1 exceeds M, M is increased by K entries, i.e.
$M_{new} = M_{old} + K$. This effectively increases M in steps
of K, while always keeping room for additional K en- 
tries. As K is constant, the bandwidth consumption does 
not increase. The algorithm requires few memory allo- 
cations, as the entries in the table can be reused as long 
as M is not increased. 
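The growth rule can be sketched as follows. The function name `prune_and_grow` and the representation of table entries as `(utility, node_id)` pairs are assumptions made for illustration; a real implementation would reuse preallocated entries instead of building new lists.

```python
def prune_and_grow(entries, m, k):
    """Apply the dynamic table rule after new entries have been merged in.

    entries: list of (utility, node_id) pairs, at most m + k long.
    Returns the pruned list (highest utility first) and the new value of m.
    """
    entries = sorted(entries, key=lambda e: e[0], reverse=True)
    # Discard non-overlapping entries from the low-utility end until the
    # table is back to m items or nothing more can be discarded.
    while len(entries) > m and entries[-1][0] < 1.0:
        entries.pop()
    # Everything left overlaps; grow m in steps of k until it all fits.
    while len(entries) > m:
        m += k
    return entries, m


grown, m_grown = prune_and_grow(
    [(5.0, "a"), (3.0, "b"), (2.0, "c"), (0.5, "d")], m=2, k=2
)
kept, m_kept = prune_and_grow(
    [(5.0, "a"), (0.9, "b"), (0.5, "c")], m=2, k=2
)
```

In the first call three overlapping nodes do not fit in a table of two, so M grows to four. In the second call pruning alone is enough and M is unchanged.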

3 Architecture 

In the following we propose a generic architecture for 
nodes participating in a distributed DSA system. The 
architecture provides a separation of concerns for node 
discovery and resource allocation, which may aid the de- 
velopment of new solutions in this area. 
Figure 3: Discovery architecture overview.



We assume that each node is able to perform three 
main tasks: a) gather information about other nodes, b) 
perform resource allocation based on this knowledge and 
c) configure its hardware to use the allocated resources. 
These tasks may either be performed by the node itself 
or by another node on its behalf, e.g. by a centralized 
system. 

Figure 3 shows the relationship between the three
tasks and the information that must be passed between 
them. The first task is discovery. In this task, information 
is gathered about other nodes. The information can be re- 
trieved with the help of a distributed protocol or a central 
database. The main goal of this task is to select a set of 
candidate nodes that must be taken into account during 
the resource allocation. This is the task that is performed 
by the discovery protocol proposed in this paper. Note 
that several mechanisms can be used simultaneously. For 
example, a database could be used to identify candidate 
nodes that are not running the distributed protocol. 

When a node has selected a set of candidate nodes, it 
must begin to allocate resources. This is handled by a 
separate abstraction we refer to as the resource alloca- 
tor. The main responsibility of the resource allocator is 
to execute a resource allocation algorithm and to provide 
support functions needed by the algorithm. As we can 
see, it is up to the allocator to gather additional informa- 
tion after the candidate nodes have been identified. The 
specific communication protocol is outside the scope of 
this paper, but we assume that it is able to contact other 
nodes directly via a network connection or via a system 
representing them. The resource allocator is also respon- 
sible for configuring the radio transceiver, which is the 
final task. 
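The separation of concerns described above could be expressed as a set of interfaces. The sketch below is one possible reading of the architecture, not a prescribed API: the class names `Discovery`, `Radio` and `ResourceAllocator` and the helper `gather_candidates` are hypothetical.

```python
from abc import ABC, abstractmethod


class Discovery(ABC):
    """Discovery task: gather information and select candidate nodes."""

    @abstractmethod
    def candidates(self) -> list:
        ...


class Radio(ABC):
    """Final task: apply an allocation to the transceiver hardware."""

    @abstractmethod
    def configure(self, channel: int, power: float) -> None:
        ...


class ResourceAllocator(ABC):
    """Runs an allocation algorithm and configures the radio."""

    def __init__(self, radio: Radio):
        self.radio = radio

    @abstractmethod
    def choose(self, candidates: list) -> tuple:
        """Return a (channel, power) pair for the given candidates."""

    def run(self, candidates: list) -> tuple:
        channel, power = self.choose(candidates)
        self.radio.configure(channel, power)
        return channel, power


def gather_candidates(sources: list) -> list:
    """Combine several discovery mechanisms, e.g. the P2P protocol and a
    database, dropping duplicates while preserving order."""
    found = []
    for source in sources:
        for node in source.candidates():
            if node not in found:
                found.append(node)
    return found
```

Keeping discovery behind its own interface is what allows a distributed protocol and a central database to be used side by side or swapped for one another.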

The discovery protocol proposed in this paper is 
generic and does not take passive receivers into account. 
To discover passive receivers a database could be added 



to the discovery tasks. Alternatively, a two-step process 
could be implemented by using the distributed protocol 
to discover databases, as described in the following. The 
server running the database joins the P2P system and reg- 
isters a coordination area which covers all the nodes it 
represents. As an example, we could assume that a TV 
broadcaster has been added to the network. The broad- 
caster has a known location and a very large coordination 
area, covering all its TV receivers. When the database 
is discovered by other nodes, the resource allocators in 
these nodes may choose to either contact the server and 
ask whether a channel is available or to avoid using the 
frequencies. If the node asks for access, it is up to the 
owner of the database to allow or deny it. This also al- 
lows the owner to require payment for access to certain 
channels, giving an economic incentive for providing the 
service. 

Wireless microphones can be handled similarly to TVs,
in that a database node registers itself in the area where 
the microphones are operating. Initially, the node would 
negotiate a frequency to be used by the microphones. 
The node's resource allocator would then be responsi- 
ble for not allowing others to use the frequency while the 
microphones are operating. 



4 Choosing A Resource Allocation Algorithm

As the main motivation behind cognitive radio and dy- 
namic spectrum access systems is increased spectral effi- 
ciency, smart resource allocation schemes are necessary 
to realize their potential. Considering operation in the 
TV bands, where secondary operation is allowed based 
on location with a maximum transmit power constraint, 
the problem is reduced to a resource sharing problem 
among peers in a multichannel interference network. In 
general one can model the interference in two ways: with
interference graphs or with the physical SINR model. Optimal
resource allocation is NP-hard in both cases. However,
the inefficiency of interference graph-based models has
been analyzed quite extensively [9][18], and they are not
suitable for capturing the accumulated interference which
is a major issue in large-scale networks. We therefore
model interference from the physical SINR model ifTTll 
which accounts for accumulated interference. 

Under the physical SINR model there are two main 
objectives: convergence to equilibria (stable-states) and 
high performance at these steady states. 

A popular approach is to use game theory for convergence
results. Note that the difficulty of this approach
in this paper is due to the large-scale network, as convergence
results for small scale networks with knowledge
of all nodes have been presented in [20]. Investigating
the performance of the steady states is more difficult.
Recently it has been shown that under geometric signal
propagation, greedy centralized algorithms can achieve
constant factor approximations to optimality [14]. The
drawback of these constant factor approximations is the
lack of fairness in the algorithms, with the result that
some nodes must stay silent. A metric that has been
shown to yield good performance is overall system
interference [20][19]. With this as the goal of the algorithm,
each node tries to minimize its impact on the system
interference while still achieving some SINR or rate
requirement.

The discovery protocol proposed in this paper aims 
to provide each node with local knowledge of other 
transceiver nodes in the area. We propose a simple re- 
source allocation scheme which assigns one channel to 
each transmit node and adjusts power to satisfy some 
SINR or rate requirement based on this knowledge. The 
algorithm is a modified version of one presented in [20].
By adding power control we show that the algorithm 
has desirable convergence properties even without global 
knowledge. Note that our proposed algorithm is only an 
example of a resource allocation for this system. We 
encourage others to propose new allocation algorithms 
which can be tested against our topology, as this is made 
public H|. 



4.1 Algorithm 

The main idea of the algorithm is for each transmitter to 
make a balanced decision between interference at its own 
receiver and that which it generates to other receivers. 
Through the discovery mechanism presented in the pre- 
vious section each transmitter has obtained knowledge 
about its candidate nodes, i.e. nodes that should be con- 
sidered by the algorithm. 

Let $\mathcal{T}$ be the set of all transmit nodes and let $\mathcal{R}$ be the
set of all receiver nodes. For simplicity we assume that
transmit node i wants to transmit to receiver node i, and
we call this link user i. We assume the users can choose
one channel from a channel set $\mathcal{K}$ to transmit on. The
SINR at user i over channel k is given as

$$\mathrm{SINR}_i(k) = \frac{g_{ii}(k)\,p_i(k)}{N_0 + \sum_{j \neq i} g_{ji}(k)\,p_j(k)} = \frac{g_{ii}(k)\,p_i(k)}{N_0 + I_i(k)} \qquad (2)$$

where $g_{ij}(k)$ is the channel gain from transmit node i
to receive node j, which may or may not depend on the
channel, $p_i(k)$ is user i's transmit power on channel k and
$N_0$ is the power of the thermal noise. The goal of each
user is to select a channel such that its SINR requirement,
$\beta_i$, is satisfied. This means that we find a channel $k^*$ such
that $\mathrm{SINR}_i(k^*) \geq \beta_i$. User i's necessary power to achieve
its SINR requirement on channel k is

$$p_i^{nc}(k) \geq \frac{\beta_i \left(N_0 + I_i(k)\right)}{g_{ii}(k)} \qquad (3)$$



Let $\mathcal{M}_i$ be the set of receiver nodes known by user i.
Note that from the P2P discovery mechanism $|\mathcal{M}_i| \leq M$.
The frequency allocation utility function we propose is
the following

$$U_i(k) = -\sum_{j \neq i} p_j(k)\,g_{ji}(k) - \sum_{j \in \mathcal{M}_i} p_i^{nc}(k)\,g_{ij}(k)\,\mathbb{1}(p_j(k)) \qquad (4)$$

where $\mathbb{1}(p_j(k)) = 1$ if $p_j(k) > 0$ and 0 otherwise. The
first summation is the interference observed at user i's
receiver and the second summation is the interference
generated by user i to user i's candidate nodes. We assume
that the interference level can be measured and thus
user i does not have to calculate the accumulated interference
in each channel based on knowledge of the other
transmitters. This entails some form of communication
between user i's receiver node and transmit node. If the
transmit node is responsible for running the algorithm,
the receiver node must feed back this information either
over the Internet or through a reverse wireless link. If
the receiver node is responsible for processing the data,
it must feed back the chosen transmit channel and power
to the transmit node. The transmit channel $k^*$ is then
selected as

$$k^* = \arg\max_{k \in \mathcal{K}} U_i(k) \qquad (5)$$

and the power is set as

$$p_i(k) = \begin{cases} p_i^{nc}(k) & k = k^* \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$
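A single update by one user can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `choose_channel` and its data layout are hypothetical, channel gains are taken to be channel independent, the measured interference at the user's own receiver stands in for the first summation of the utility, and the necessary power is taken with equality.

```python
def choose_channel(i, channels, gain, power, interference, known, beta, noise):
    """Return (channel, power) maximizing the utility for user i.

    gain[a][b]      gain from transmitter a to receiver b (assumed flat)
    power[j][k]     current transmit power of user j on channel k
    interference[k] measured interference at user i's receiver on channel k
    known           users whose receivers are candidate nodes of user i
    """
    best_k, best_u, best_p = None, float("-inf"), 0.0
    for k in channels:
        # Power needed to just meet the SINR requirement on channel k.
        p_needed = beta * (noise + interference[k]) / gain[i][i]
        # Interference we would cause at known receivers active on channel k.
        caused = sum(
            p_needed * gain[i][j]
            for j in known
            if j != i and power[j][k] > 0
        )
        u = -interference[k] - caused
        if u > best_u:
            best_k, best_u, best_p = k, u, p_needed
    return best_k, best_p


# Two users, two channels. User 1 currently transmits on channel 0.
gain = {0: {0: 1.0, 1: 0.1}, 1: {0: 0.2, 1: 1.0}}
power = {1: {0: 1.0, 1: 0.0}}
interference = {0: gain[1][0] * power[1][0], 1: 0.0}

channel, tx_power = choose_channel(
    0, [0, 1], gain, power, interference, known=[1], beta=2.0, noise=0.1
)
```

In this toy case user 0 avoids channel 0, where it would both receive and cause interference, and settles on channel 1 at the lowest power that meets its requirement.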



As nodes are discovered as candidates, they exchange
radio parameters so that Equation 4 can be estimated.
We assume each node knows its location and that path
loss can be estimated by distance. The radio parameters
which need to be exchanged between the candidates after
they have located each other using the discovery protocol
are: (i) location, (ii) current mode of operation (i.e. is
a receiver or transmitter), (iii) corresponding
transmitter/receiver (i.e. which node it communicates with) and
(iv) which channel it uses.

4.2 Convergence, Implementation Issues 
and Complexity 

We can show that the game formed by the utility function
in Equation 4 is a generalized potential game. Let $\mathcal{N}_i$ be
the set of neighboring receiver nodes closest to transmit
node i such that every channel is utilized by at least one
node in $\mathcal{N}_i$. Then we have the following result.

Proposition 1. The game formed by the utility function
$U_i(k)$ converges if $\mathcal{N}_i \subseteq \mathcal{M}_i$ for all i, where $\mathcal{M}_i$ is
the set of receiver nodes known by user i.

Due to space limitations we omit the proof in this paper,
but an extended technical report with a detailed section
on game theory and a detailed proof of the proposition
is available [25]. The criterion for convergence of
the resource allocation algorithm was found using potential
game theory. Potential games converge in most cases,
but not when all nodes update their strategies
simultaneously. As long as subsets of the nodes perform
their updates at different times, the convergence result
still holds. As it is unlikely that all nodes will
perform updates at the same time, it is reasonable to
assume the convergence result holds in practice.

Another aspect to consider is the exchange of radio
parameters. This is determined by the utility function given
in Equation 4. As we assume that the observed
ference at a receiver can be measured for all channels, 
each transmit node needs to obtain information about 
the channel gain between itself and all neighboring re- 
ceivers and the channel they receive on. As the number 
of neighbors can be large, obtaining channel gain infor- 
mation through pilot signals would introduce an exten- 
sive amount of signaling in the system. Instead we as- 
sume distance can be used to estimate channel gain. 

If we assume that position is given as GPS coordi- 
nates, we need 10 bits per dimension to be accurate 
within 100 meters, while we need 17 bits per dimension 
to be accurate within 1 meter. The number of bits needed 
for the channel number depends on the number of channels
available to the system. If this number is $|\mathcal{K}|$, then
we need $\lceil \log_2(|\mathcal{K}|) \rceil$ bits. In the US, the FCC has opened
a subset of channels ranging from 2-51 for unlicensed
use [8]. In such a case we would need 6 bits to represent
all channel numbers in the US. Note that if we assume 
position to be relatively static within the time of conver- 
gence, channel number is the only parameter that needs 
to be exchanged after the first iteration. 
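The bit counts above can be checked with a few lines of code. The helper names are illustrative, and the 100 km extent per dimension is an assumption introduced here: it is the order of magnitude for which 10 and 17 bits give 100 meter and 1 meter resolution, respectively.

```python
import math


def bits_for_values(count: int) -> int:
    """Smallest number of bits that can represent `count` distinct values."""
    return (count - 1).bit_length() if count > 1 else 0


def bits_per_dimension(extent_m: float, resolution_m: float) -> int:
    """Bits needed to index positions along one axis at a given resolution."""
    cells = math.ceil(extent_m / resolution_m)
    return bits_for_values(cells)


channel_bits = bits_for_values(51 - 2 + 1)        # channels 2 through 51
coarse_bits = bits_per_dimension(100_000, 100)    # assumed 100 km extent
fine_bits = bits_per_dimension(100_000, 1)
```

With 50 channels the channel number fits in 6 bits, so after the first iteration each update only needs to carry a few bits per candidate.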

In terms of complexity we can give a complexity
bound per transmitter i as follows: Equation 3 performs
one addition, one multiplication and one division per
channel. Equation 4 performs $|\mathcal{M}_i|$ additions and
multiplications per channel, and Equation 5 performs $|\mathcal{K}|$
comparisons. Thus the complexity per transmitter per
iteration is given as $O(|\mathcal{K}||\mathcal{M}_i|)$.

5 Evaluation 

We evaluate the discovery protocol in terms of conver- 
gence time in different topologies, i.e., the time it takes 
for all nodes in the network to discover all their candi- 
date nodes, as well as the effect on resource allocation. 
The evaluation is performed on both randomly generated
topologies and topologies generated from statistical data
from three municipalities in Norway. We perform
simulations for the smallest topologies in an event-based
simulator and use a cycle-based simulator to evaluate the
largest topologies. Finally, we run the proposed resource
allocation algorithm on a subset of the results,
showing that the knowledge gained from the discovery
mechanism leads to an improvement in frequency
allocation.

Parameter                        Value
Network latency                  ~50 ms
Periodic update interval         15 s
Maximum coordination range       50 m, 100 m
Imp. nodes per exchange (K)      100 (200)
Imp. nodes table (M)             Dynamic, max. 600 (800)
Newscast table size (N)          40
Nodes in random topologies       10 000, 1 000 000
Nodes in Tynset topology         5 826
Nodes in Vinje topology          4 074
Nodes in Lillehammer topology    26 036

Table 2: Evaluation parameters for the discovery protocol.

5.1 Topologies 

The random topologies are created by randomly plac- 
ing node pairs in a given geographical area. The node 
pairs consist of two nodes positioned within radio range 
of each other. As an approximation, we assume that to 
communicate efficiently, the two nodes must not be far- 
ther apart than half the coordination range. The coordi- 
nation range used by the utility function is therefore set 
to twice the distance between the nodes. For example, 
two nodes placed 50 meters apart would use 100 meters 
as the coordination range in the discovery protocol. 

In the randomly generated topologies, the node pairs 
are distributed uniformly in a rectangular area. By vary- 
ing the size of this area, the average number of candi- 
date nodes each node has can be adjusted. We use these 
topologies to evaluate the general performance of the dis- 
covery protocol. 
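
The construction of the random topologies can be sketched
as follows. This is a minimal illustration and not the
simulator used in the paper: the function name, the
uniform choice of link direction and link distance, and
the fact that the second node of a pair may fall slightly
outside the rectangle are all assumptions made here.

```python
import math
import random


def random_pair_topology(num_pairs, width_m, height_m, max_coord_range_m, seed=None):
    """Place node pairs uniformly in a width_m x height_m rectangle.

    Returns a list of (node_a, node_b, coord_range_m) tuples, where each
    node is an (x, y) position in meters.
    """
    rng = random.Random(seed)
    # The two nodes of a pair are at most half the coordination range apart.
    max_link_m = max_coord_range_m / 2.0
    pairs = []
    for _ in range(num_pairs):
        ax = rng.uniform(0.0, width_m)
        ay = rng.uniform(0.0, height_m)
        # Assumption: direction and link distance are drawn uniformly.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = rng.uniform(0.0, max_link_m)
        bx = ax + dist * math.cos(angle)
        by = ay + dist * math.sin(angle)
        # The coordination range is twice the distance between the nodes.
        pairs.append(((ax, ay), (bx, by), 2.0 * dist))
    return pairs
```

Shrinking width_m and height_m while keeping num_pairs
fixed raises the node density and thereby the average
number of candidate nodes per node.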

To provide a more realistic challenge for the P2P
protocol and the resource allocation algorithm, three
topologies were generated based on population data in
Norway [2]. We based the topologies on the municipalities
of Tynset, Vinje and Lillehammer. The statistical data
available is the population per cell of 100 x 100 meters.
As we assume one link per household, we divided the
population in each cell by 2.22, the average number of
persons per household in Norway, and rounded up. Within
each cell the node pairs are distributed randomly.
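
The household-to-link conversion amounts to a ceiling
division per cell. The sketch below illustrates it; the
function names are hypothetical and only the constant
2.22 and the rounding rule come from the text.

```python
import math

# Average number of persons per household in Norway, as stated in the text.
PERSONS_PER_HOUSEHOLD = 2.22


def links_in_cell(population):
    """Number of links (one per household) in a 100 x 100 m cell."""
    return math.ceil(population / PERSONS_PER_HOUSEHOLD)


def nodes_in_cell(population):
    """Each link is a node pair, so a cell holds two nodes per link."""
    return 2 * links_in_cell(population)
```

For example, a cell with 5 inhabitants yields 3 links and
therefore 6 nodes, while an empty cell yields none.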

Figure 4: Topology for Tynset in 1 x 1 km resolution.
Gray squares are areas with links. The 305 links in the
black square are used to evaluate resource allocation.

Figure 5: Topology for Lillehammer in 1 x 1 km
resolution. Gray squares represent areas with links.

The generated topology for Tynset is shown in Figure 4 to
illustrate the node distribution in the municipalities.
As we can see, the topology contains several separate
clusters of nodes and has varying population density.
Figure 6 gives the distribution of candidate nodes in the
P2P network for maximum coordination ranges of 50 and 100
meters. Most of the nodes in the topology have fewer than
10 candidates on average when the coordination range is
limited to 50 meters. When the coordination range is
increased, each node sees more of the network, resulting
in a more even distribution of candidate nodes. A similar
result can be seen for Lillehammer. Lillehammer is larger
than Tynset and has a higher population density, with
some nodes having up to 300 candidate nodes when using a
coordination range of 100 meters. The Lillehammer
topology is depicted in 1 x 1 km resolution in Figure 5.

We use a two-dimensional version of the utility function
(see Equation 1) in the evaluation.

5.2 Discovery protocol 

We start by evaluating the random topology with 10 000
nodes and the three municipalities of Vinje, Tynset and
Lillehammer using an event-based simulator written in
Java. The simulator adds a network latency of 50 ms and
we use a 15 second update interval for the discovery
protocol. The simulation parameters are summarized in
Table 2. The higher values for K and M are only used for
Lillehammer.

Figure 6: Distribution of candidate nodes in Lillehammer
and Tynset.

Initially each node knows about 40 other randomly
selected nodes. Every node then attempts to improve its
view of the network by exchanging information until it
has found its candidate nodes, i.e., all other nodes with
overlapping coordination ranges. This can be seen as a
"warm start" of the network, as each node starts with a
random sample of the network.
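
To measure convergence, a simulator needs the true
candidate sets to compare against. The sketch below
computes them by brute force. It assumes that each
coordination area is a circle centered on the node, so
that two areas overlap when the distance between the
nodes is at most the sum of their coordination ranges;
the paper's utility function may define overlap
differently, and all names are illustrative.

```python
import math


def are_candidates(pos_a, range_a, pos_b, range_b):
    """True if the two circular coordination areas overlap (assumption)."""
    distance = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return distance <= range_a + range_b


def candidate_sets(nodes):
    """nodes: list of ((x, y), coord_range). Returns one index set per node."""
    result = [set() for _ in nodes]
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if are_candidates(nodes[i][0], nodes[i][1], nodes[j][0], nodes[j][1]):
                result[i].add(j)
                result[j].add(i)
    return result


def is_stable(known, truth):
    """The network is stable once every node knows all its true candidates."""
    return all(t <= k for k, t in zip(known, truth))
```

The quadratic loop is only practical for small
topologies; a spatial grid would be needed at the scale
of the larger simulations.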

Figure 7 presents the time it takes to reach a stable
state, with standard deviation, for a varying average
candidate node degree in the random topologies with
10 000 nodes. The experiment is repeated 20 times for
each point in the graph. As we can see, the time to reach
a stable state increases as the average number of
candidate nodes increases. In other words, the more
candidate nodes a node has, the longer it takes to find
all of them. With fewer than 20 candidate nodes it takes
between two and three minutes before the network is
stable, while with 160 candidate nodes it takes
approximately 800 seconds, or 13 minutes, to discover all
of them.

To evaluate how long it takes for a new node to become 
a part of the network we add new nodes after the random 
topologies have converged. At start up, the new nodes 
connect to the same node. They then receive a copy of 
its tables and proceed to connect to other random nodes 
in the topology. 

Figure 8 shows the average time it takes for the network
to regain a stable state after 2, 20 and 200 nodes join
the network. The experiment is repeated 20 times for each
area size and number of new nodes. The standard deviation
is omitted for clarity. As we can see, the time it takes
to reach a stable state increases with the number of
candidate nodes and the number of new nodes added.

Figure 7: Average time to reach stable state with 10 000
nodes.

Figure 8: Average time to reach stable state with 10 000
nodes after adding new links.



If a single device is turned on in an area where it has
fewer than 100 candidate nodes, it takes less than 120
seconds, or two minutes, on average before it knows about
all the candidates. With 160 candidate nodes, the time
increases to between three and four minutes. As the
protocol is intended to run on devices which are
connected to the network for long time periods, we expect
this start-up delay to be acceptable. The delay could be
reduced by using a shorter periodic update interval
during start-up for new devices, at the cost of higher
bandwidth and processing requirements.

Figure 9 shows the average time it takes to reach a
stable state for Tynset, Vinje and Lillehammer for a
maximum coordination range of 50 and 100 meters. The high
number of candidate nodes in the center of Lillehammer
increases the time the protocol takes to converge. When
the coordination range is increased, Lillehammer has a
higher increase in convergence time compared to Tynset
and Vinje. This is because Lillehammer is more densely
populated and the relative increase in the number of
candidate nodes is higher. As we can see in Figure 6,
many nodes end up with more than 100 candidate nodes with
a coordination range of 100 meters. This is higher than
the number of important nodes sent in each message
(K = 100), which means that the full table of important
nodes cannot be transmitted in a single exchange. To
verify this, we repeat the experiment with K = 200 and
M = 800. The result is shown in the graph as
Lillehammer+.

Figure 9: Average convergence time in seconds for Vinje,
Tynset, Lillehammer and Lillehammer+.

5.2.1 Large-scale evaluation 

To show that the protocol scales to large topologies we
implemented an iteration-based simulator that enabled us
to perform tasks within each iteration in parallel.
Evaluations were performed with a random topology with
500 000 link pairs (1 000 000 nodes). During the
evaluation each node in the topology connects to a
well-known node and we measure the number of iterations
required before all nodes have found all their candidate
nodes. This can be seen as a "cold start" of the network,
where no other information is known in advance. We used a
geographic area of 50 000 x 50 000 meters and a maximum
coordination range of 100 meters. The topology has 16.79
candidate nodes on average with a standard deviation of
8.94. The maximum candidate node degree is 63.

The 50 000 x 50 000 meter random topology with 1 000 000
nodes reached a stable state after 70 iterations. With 15
second update intervals, we estimate that the network
would have converged approximately 18 minutes after a
"cold start".
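
The time estimate follows from multiplying the iteration
count by the update interval. The short sketch below
reproduces the arithmetic under the assumption that one
simulator iteration corresponds to exactly one periodic
update interval and that network latency is negligible;
the function name is illustrative.

```python
def convergence_time_s(iterations, update_interval_s=15.0):
    """Wall-clock estimate, assuming one iteration per update interval."""
    return iterations * update_interval_s


seconds = convergence_time_s(70)  # 70 * 15 s = 1050 s
minutes = seconds / 60.0          # 17.5 minutes, i.e. roughly 18 minutes
```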

5.3 Resource Allocation 

In [26] the availability of white space spectrum in the
UK was investigated. Depending on the assumed device
operating in the white spaces, it was found that between
10 and 15 channels of 8 MHz were available. To be
conservative, we evaluate the resource allocation
algorithm with 10 channels. Note also that resource
allocation becomes more difficult with fewer channels.
Each transmitter has a maximum transmit power of 100 mW,
which is the maximum transmit power for mobile white
space devices as defined by the FCC [8]. The
signal-to-interference-plus-noise ratio (SINR)
requirement at each link is uniformly distributed between
1 and 10. The rest of the simulation parameters are given
in Table 3.

Parameter                                  Value
-----------------------------------------  -------------
Number of channels                         10
Maximum transmit power                     100 mW
SINR requirement                           ~ U(1, 10)
Thermal noise power                        10^-8 W
Path loss between node i and j, g_{i,j}    d(i,j)^-3
Number of links considered                 305
Maximum coordination range                 50 m, 100 m
Maximum number of iterations               20

Table 3: Evaluation parameters for the resource
allocation algorithm.

We evaluate the resource allocation algorithm in the 
light of both its quality as a resource allocation algorithm 
and how it progresses based on candidate node discovery. 
For all plots, the results are plotted against time as the 
discovery protocol finds more candidates. 

The resource allocation algorithm is compared to two 
simple allocation schemes: random and selfish. The 
random allocation scheme selects a frequency randomly 
from the set of available channels. The selfish scheme 
selects the channel with least interference and transmits 
with a power just large enough to achieve its SINR re- 
quirement. 
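
The two baseline schemes can be sketched as follows. This
is an illustration under stated assumptions rather than
the code used in the evaluation: the function names and
the per-channel interference list are hypothetical, the
random scheme is assumed to transmit at maximum power
(the text only specifies its channel choice), and the
required power is derived from the usual definition
SINR = p * g / (noise + interference).

```python
import random


def random_scheme(num_channels, max_power_w, rng=random):
    """Pick a channel uniformly at random.

    Assumption: the transmitter uses its maximum power.
    """
    return rng.randrange(num_channels), max_power_w


def selfish_scheme(interference_w, link_gain, sinr_req, noise_w, max_power_w):
    """Pick the least interfered channel and the smallest sufficient power.

    interference_w[c] is the interference power (in watts) measured at the
    link's receiver on channel c. Returns (channel, power_w, satisfied).
    """
    channel = min(range(len(interference_w)), key=lambda c: interference_w[c])
    # Solve sinr_req = p * link_gain / (noise + interference) for p.
    needed_w = sinr_req * (noise_w + interference_w[channel]) / link_gain
    power_w = min(needed_w, max_power_w)
    return channel, power_w, needed_w <= max_power_w
```

With a link distance of 25 m, a path gain of 25**-3, an
SINR requirement of 5, thermal noise of 1e-8 W and
interference of 1e-8 W on the best channel, the selfish
scheme needs about 1.56 mW, well below the 100 mW limit.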

The total number of links in the Tynset topology is 2913
(the total number of nodes is 5826). As allocating
resources for all 2913 transmitters would require
enormous processing, we only evaluate the resource
allocation algorithm on an area of 1 x 1 km (as can be
seen in Figure 4). In this area there are 305 links to
consider. The links are created from household locations
as described in Section 5.1. Note that the maximum
distance between two linked nodes is half the
coordination range.

We begin by assigning a random channel and a transmit
power of 100 mW to each of the 2913 transmitters. We then
start the algorithm for the transmitters within the
1 x 1 km area. If nodes have receiving candidate nodes
outside the 1 x 1 km grid, these are still considered in
the frequency allocation utility function (Equation 4) of
the transmitters.



Figure 10: Number of satisfied links as a function of time
in Tynset with max. coord. range of 50 m.



In Figure 10 the number of satisfied links is given for
Tynset with a coordination range of 50 meters and a
maximum link distance of 25 meters. As expected, the
random allocation scheme and the selfish scheme behave
the same over time, as their performance is not related
to candidate node discovery. For our algorithm we see a
clear increase in the number of satisfied links between
50 and 100 seconds. The increase has two dependent
causes: 1) From 50 to 100 seconds the node discovery
mechanism goes from a state where very few nodes know all
their candidate nodes to a state where most nodes know
all their candidate nodes, as seen in Figure 11. 2) From
Figure 11 we also see that our algorithm goes from a high
average of 18 iterations to between 13 and 14 iterations.
The average of 18 is close to the maximum number of
iterations (20) we set for the algorithm and is due to
the algorithm not being able to converge when too little
information is available about candidate nodes. When more
information becomes available the algorithm always
converges and the average goes down. Note that as some
links are without candidate nodes, there are fewer nodes
with missing candidate nodes than the total number of
nodes in Figure 11.

Figure 11: Number of iterations until convergence against
time in Tynset with max. coord. range of 50 m.

In essence, this shows that in order for our algorithm to
perform well, the algorithm must converge. For our
algorithm to converge, we have to know a sufficient
number of candidate nodes such that at least one node
utilizes each channel (by Proposition 1). This seems to
occur at about 100 seconds.

Figure 12: Number of satisfied links as a function of time
in Tynset with max. coord. range of 100 m.

Figure 13: Number of iterations until convergence against
time in Tynset with max. coord. range of 100 m.

The importance of convergence is further emphasized in
Figure 12 and Figure 13, where the maximum coordination
range is increased to 100 meters and the maximum transmit
distance is 50 meters. As some links do not have
candidate nodes, the number of initial nodes with missing
candidate nodes is also here lower than the total number
of nodes. Our algorithm starts off with performance
similar to the selfish approach. At 50 seconds the
performance drops below the random selection scheme,
before it starts to improve and settles at around 100
seconds. The convergence plot in Figure 13 is similar to
Figure 11, and we can again conclude that the algorithm
performs well when it is able to converge.



In Figure 12 the standard deviations at certain time
instances are given. The standard deviation does not
depend on time for the selfish algorithm and is shown
only for the time instance at 10 s. For the selfish
algorithm the standard deviation is 18.2, while ours
starts at 18.1 and then gradually decreases to 8.2. Thus,
as time progresses and more candidate nodes are
discovered, the variation in performance also decreases.

An interesting aspect of Figure 13 is the low performance
between 50 and 80 seconds, which did not occur in
Figure 10. The two cases differ in the probability
distribution of distances between transmitters and
receivers; the longer distances also increase the area
within which nodes are considered relevant and thus the
number of candidate nodes. We explain the drop in
performance between 50 and 80 seconds as due to knowing
some candidate nodes, but not enough and not the most
important ones. If a transmitter bases its allocation on
the knowledge of only a few far-away candidate nodes,
then it is likely to disturb near-by candidate nodes not
known to the transmitter.

We also see that the difference between our algorithm and
the selfish approach is larger in Figure 12 than in
Figure 10. With a longer transmission range it is likely
that more transmitters affect receiver i, while
transmitter i might not affect these other receivers.
Thus, it becomes more important to consider the
well-being of other links when allocating frequency and
power. If the distance between transmitter i and receiver
i went to 0, our algorithm would be equivalent to the
selfish approach.

An important aspect of distributed resource allocation
schemes in large-scale networks is the domino effect,
which is the change in the network due to a change in
resource allocation at one or more nodes. It is desirable
that this effect is as small as possible, i.e., as few
nodes as possible should have to update their allocation
due to a change at another node. In general, changing the
power level at a transmitter only leads to a minor domino
effect: because the path loss exponent is larger than 2,
a large change at one node affects the other nodes on a
much smaller scale. Predicting the effect of a frequency
change is more difficult.

To investigate this effect we let our original network
converge (220 seconds in Figure 14) and then insert 10
links in our 1 x 1 km area (230 seconds in Figure 14). We
measure the change over time as the number of links that
change their frequency from one time interval to the
next.
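
The change metric can be computed by comparing two
consecutive snapshots of the channel assignment. The
sketch below is one possible reading of the measurement:
the function name and the dictionary representation are
hypothetical, and it assumes that a newly inserted link
does not count as a change in the interval in which it
first appears.

```python
def links_changing_frequency(previous, current):
    """Count links whose channel differs between two snapshots.

    previous and current map a link identifier to its assigned channel.
    Assumption: links missing from the previous snapshot (newly inserted
    links) are not counted as changes.
    """
    return sum(
        1
        for link, channel in current.items()
        if link in previous and previous[link] != channel
    )
```

Applying the function to each pair of consecutive 10
second snapshots gives a time series of the kind plotted
against the number of satisfied links.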



From Figure 14 we see that approximately one third of the
links change their frequency at the time the new links
are inserted into the network. Only 2-3 links then change
their frequency in each 10 second interval until after
290 seconds, where 13 links change their frequency. After
this, the change goes down to just above 0. An
interesting observation is that when the 13 links change
their frequency at 290 seconds, the number of satisfied
links also increases. It seems that this change is
triggered by the discovery of new near-by candidate nodes
by both new and old links. When these links adjust their
frequency assignments the number of satisfied links
increases.

Figure 14: Satisfied links and number of links changing
frequency as a function of time. 10 links are added at
230 seconds.

6 Related work

Most of the research published so far supports the needs
of the IEEE 802.22 standard, where the challenge is to
identify local TV transmitters in the area. This has
resulted in numerous publications on detection algorithms
and corresponding false alarm and detection probability
values. This research is relevant to our work since it
will enable us to measure the noise and interference
level for a receiver and use locally measured values
rather than calculated values. Since we use the Internet
for coordination and exchange of radio parameters, a
coordinated common control channel is fortunately not
required EJ, 151. Ifl5l 151.

The idea of using P2P clients at base stations for
establishing direct communication via radio using
centralized control has been described in a patent claim
Ell . This system is aimed at relieving traffic to base
stations by establishing direct radio links between
users. In another patent claim E4l . a spectrum manager
and a base station controller are used to calculate radio
parameters, collect data from the central database and
sense the presence of a TV signal. An Internet-based P2P
mechanism to locate and distribute data from other
clients, which is the original part of our paper, is not
addressed. Gossiping mechanisms have previously been used
in cognitive radio networks, such as in (4). This
approach is different from ours in that it is based on
effectively distributing spectral sensing information,
not discovering other nodes over the Internet.

The protocol used for exchanging and merging tables of
important nodes can be seen as a variation of the generic
overlay topology construction protocol described in [12].
The protocol described in this paper is, however,
different due to the unique utility function, as well as
the limited table sizes imposed by the bandwidth
constraints. The goal of our utility function is not to
construct an overlay, but to locate nodes with
interfering radios. As radios have a tendency to form
clusters around transmitters, the resulting network may
contain partitions. To find nodes across partitions our
protocol also requires regular random samples of the
topology to be provided by a peer sampling protocol [13].

We have previously proposed using P2P for frequency
allocation in [16] and [10], but the actual discovery
protocol has remained future work.

To the best of our knowledge, there are at this point no
other existing P2P protocols that enable discovery of
nodes belonging to overlapping geographical areas.


7 Conclusion 

In this paper, we have presented a decentralized pro- 
tocol and architecture for discovering interfering radio 
devices over the Internet. The protocol has low mem- 
ory and bandwidth requirements, making it suitable for 
running on devices with limited hardware. We have 
shown through simulation that the protocol scales to net- 
works with up to 1 000 000 nodes and evaluated its per- 
formance in topologies generated from population data 
from three municipalities in Norway. Finally, we have 
adapted and evaluated an existing resource allocation al- 
gorithm which utilizes the local knowledge gained from 
the discovery protocol. The evaluation reveals that local 
knowledge improves the number of satisfied users. We 
expect that future research on resource allocation algo- 
rithms with local knowledge will improve the gains fur- 
ther. 

Some of the techniques used to realize the system de- 
scribed in this paper are adapted from previously pub- 
lished works. A key contribution of the paper is thus how 
these existing components can be combined to enable a 
large-scale distributed DSA architecture. 

The simulator and the data sets used in this paper will 
be made publicly available. 